Speech enhancement using voice source models
Authors
Abstract
Autoregressive (AR) models have been shown to be effective models of the speech signal. However, the most common model of speech used for speech enhancement, an AR process excited by white noise, fails to capture the effects of source excitation, especially the quasi-periodic nature of voiced speech. Speech synthesis researchers have long recognized this problem and have developed a variety of sophisticated excitation models. Such models have yet to make an impact in speech enhancement. We have concentrated our research on modifying the conventional white-noise-excited AR model for various speech classes and on establishing performance benchmarks by studying speech enhancement, using the proposed models, in detail for individual phonemes under arbitrarily well-characterized circumstances. We have proposed three different types of impulsive excitation models for an AR model for various phoneme classes, based on the type of excitation with which each class is associated. For voiced speech, the effect of the glottal excitation is simulated by a train of impulses spaced according to pitch periods. For unvoiced stops and unvoiced affricates, the excitation source is modeled by a single impulse marking the instant of the onset of the burst, plus a white noise term. For voiced stops and voiced affricates, a mixed excitation of the plosive driving term and a quasi-periodic train of impulses is used. For voiced fricatives, a mixed excitation of white noise and a quasi-periodic train of impulses separated by pitch periods is used. In each case, the impulsive AR models outperformed their white-noise-driven counterparts. The success of these tentative impulsive excitation models motivated us to apply a more sophisticated excitation model. We have chosen one of the most common excitation source models, the four-parameter model of Fant, Liljencrants and Lin [1], also known as the LF model, and applied it to the enhancement of individual voiced phonemes.
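For orientation, the LF model named above is usually written as a piecewise model of the glottal flow derivative over one pitch period. The form below is the commonly cited textbook statement, not one taken from this abstract; the symbols (t_p, t_e, t_a, t_c, E_e, E_0, alpha, epsilon, omega_g) follow the usual LF convention and are assumptions as far as this paper's own notation is concerned.

```latex
E(t) =
\begin{cases}
E_0 \, e^{\alpha t} \sin(\omega_g t), & 0 \le t \le t_e, \\[4pt]
-\dfrac{E_e}{\varepsilon t_a}
  \left[ e^{-\varepsilon (t - t_e)} - e^{-\varepsilon (t_c - t_e)} \right],
  & t_e < t \le t_c,
\end{cases}
\qquad \omega_g = \frac{\pi}{t_p},

% Constraints that tie the two segments together:
E_0 \, e^{\alpha t_e} \sin(\omega_g t_e) = -E_e,
\qquad
\varepsilon t_a = 1 - e^{-\varepsilon (t_c - t_e)},
\qquad
\int_0^{t_c} E(t)\, dt = 0 .
```

In this form the four free parameters are typically taken to be the timing values t_p, t_e, t_a and the excitation strength E_e; alpha and epsilon are then solved from the constraints, which is why fitting the model generally needs an iterative numerical procedure.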
We have proposed a novel two-step optimization algorithm for estimating the parameters of an LF model. Among the AR models with three different types of excitation (a conventional white-noise excitation, an impulsive excitation and an LF model), the LF excitation model yields the best performance in speech enhancement in terms of output signal-to-noise ratio (SNR).
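The excitation types described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the noise level, the AR coefficients and the fixed (rather than quasi-periodic) pitch period are all hypothetical choices made for illustration, and the SNR helper is the standard energy-ratio definition rather than anything specified in the paper.

```python
import numpy as np


def ar_synthesize(excitation, ar_coeffs):
    """All-pole synthesis: s[n] = sum_k a[k] * s[n-k] + e[n]."""
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k, a_k in enumerate(ar_coeffs, start=1):
            if n - k >= 0:
                acc += a_k * s[n - k]
        s[n] = acc
    return s


def white_noise(n_samples, rng, std=1.0):
    """Conventional white-noise excitation."""
    return std * rng.standard_normal(n_samples)


def impulse_train(n_samples, pitch_period, gain=1.0):
    """Voiced speech: impulses spaced one (fixed, assumed) pitch period apart."""
    e = np.zeros(n_samples)
    e[::pitch_period] = gain
    return e


def burst_excitation(n_samples, onset, rng, noise_std=0.1):
    """Unvoiced stops/affricates: one impulse at burst onset plus white noise."""
    e = white_noise(n_samples, rng, noise_std)
    e[onset] += 1.0
    return e


def mixed_excitation(n_samples, pitch_period, rng, noise_std=0.1):
    """Voiced fricatives: white noise plus a pitch-spaced impulse train."""
    return impulse_train(n_samples, pitch_period) + white_noise(
        n_samples, rng, noise_std
    )


def snr_db(clean, estimate):
    """Output SNR in dB: clean-signal energy over residual-error energy."""
    error = clean - estimate
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(error ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ar = [1.3, -0.8]  # hypothetical stable AR(2) coefficients
    voiced = ar_synthesize(impulse_train(400, pitch_period=80), ar)
    noisy = voiced + 0.5 * rng.standard_normal(len(voiced))
    print("input SNR (dB):", snr_db(voiced, noisy))
```

An enhancement system would estimate the AR coefficients and excitation parameters from the noisy signal and report `snr_db` on its output; the sketch only shows how the candidate excitation signals differ before they drive the same all-pole filter.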
Similar resources
Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering
Gaussian Mixture Models (GMMs) of power spectral densities of speech and noise are used with explicit Bayesian estimations in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD) or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined system of equatio...
Voice Processing by Dynamic Glottal Models with Applications to Speech Enhancement
We discuss the use of low-dimensional physical models of the voice source for speech coding and processing applications. A class of waveform-adaptive dynamic glottal models and parameter tracking procedures are illustrated. The model and analysis procedures are assessed by addressing speech encoding and enhancement, achievable by using a state space version of the dynamical model in a Extended ...
Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data
Thanks to the growing availability of spoofing databases and rapid advances in using them, systems for detecting voice spoofing attacks are becoming more and more capable, and error rates close to zero are being reached for the ASVspoof2015 database. However, speech synthesis and voice conversion paradigms that are not considered in the ASVspoof2015 database are appearing. Such examples include...
Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems
This article aims to examine methods of optimizing GMM-based voice conversion systems performance in which GMM method is introduced as the basic method for improvement of voice conversion systems performance. In the current methods, due to using a single conversion function to convert all speech units and subsequent spectral smoothing arising from statistical averaging, we will observe quality ...
Using the Glottal Source in Voice Technology Applications
From artificial voices in GPS to automatic systems of dictation, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking for efficiency enhancement with simultaneous cost saving, the market of speech technology is forecast to be particularly promising in the next ye...
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...
Publication date: 1999